ETL Raw YouTube Data into Valuable Analytical Information. Visualize.

Checkout the interactive plots and links version in this notebook on Github pages

Abstract:

This notebook uses SQL to transform our raw YouTube data, from the viral playlist, into useful analytical information.

  • The sql query generated below could be used in an ETL for SSIS/SSMS

Prerequisites

  1. Read the video_playlist_ETL notebook

Goals:

Helper Functions

  • H1. Build 3 helper functions to easily search and watch our videos
      1. make_clickable() makes our urls clickable
      1. make_df_url() makes all the urls in the df clickable
      1. findVideo() works like a google search for any video in our analysis
      1. Kick back and watch some videos during the analysis. ***it's almost as fun as our plots***

Extract

Transform

  • T1. Transform raw data in SQL to useful analytical data. Example SSIS ELT job.
    • A. Inner SQL Query builds CTE using LAG() and CASE to generate new count_deltas statistics
    • B. Middle SQL query uses SELECT FROM CTE to create time_delta_min and view_per_min.
      • Ensures clean data WHERE title=last_title.
    • C. Outter SUBQUERY SELECT to pull our clean newly defined deltas stats
  • T2. Build a few new columns from existing data and reset data types

Load (In Development)

  • L1. If using SSMS detail how to build new scheduled job
  • L1. If using AirFlow+Postgre build dag to upload transformed data as a new table.
    • Need to limit data extracted + transformed on most recent load

Visualize

  • V1. Plot viewcount by position in the playlist
    • The top video is HEYYEYAAEYAAAEYAEYAA with 171M views, ~855,000USD in revenue
    • The second most played is the start of the playlist, YEE, with 78M views ~390,000USD in revenue
  • V2. Generate df with average views per day by video
  • V3. Plot view_per_min by day to see if people are still watching
  • V4. Plot averages views of the playlist by hour to see most popular times
  • V5. Clean up dirty data and plot averages playlist view by day again
  • V6. Plot averages views of the playlist by day to see most popular days
    • Surprisingly more people are watching videos midweek, Max = Wednesday, 2.76 view/min
  • V7.Plot averages views of the playlist by day+hour to see most popular times by day
  • V8. Lets check our view_per_min/day distributions with a violin plot
    • Some data is negative, likely a data collection issue from youtube viewcount
In [1]:
from datetime import datetime
import pandas as pd
import math
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT
from psycopg2 import sql, connect
import psycopg2
import pandas as pd
import plotly.express as px
from IPython.display import IFrame

#import credentials (database host address) from your youtube_config.py
import sys
sys.path.insert(1, '../../')
from youtube_config import host

Helper Functions

H1. Build 3 helper functions to easily search and watch our videos

H1. Build 3 helper functions to easily search and watch our videos

  1. make_clickable() makes our urls clickable
  2. make_df_url() makes all the urls in the df clickable
  3. findVideo() think of this function like a google search for any video in our analysis
In [2]:
def make_clickable(val):
    """
    Makes hyperlinks into clickable html
    
    val: input url
    """
    return '<a href="{}">{}</a>'.format(val,val)
In [3]:
def make_df_url(df):
    """
    Returns data frame with clickable 'pl_url' columns links 
    
    df: input data frame
    """
    return df.style.format(make_clickable, subset='pl_url')
In [4]:
def findVideo(input_str=None):
    """
    Search any string within video title. Returns most recent stats of videos with search string
    in the title as a dataframe
    
    input_str: input video title string
    """
    input_str = input_str.lower()
    returndf = sql_df[sql_df['title'].str.lower().str.contains(input_str)].sort_values(
                by='datetime', ascending=False).drop_duplicates('title').sort_values(by="title")
    
    return returndf.style.format(make_clickable, subset='pl_url')

Extract

E1. Use psycopg2 to connect to our database

E2. Review raw data set from the video_stats table created in video_stats_hourly_ETL

E1. Use psycopg2 to connect to our database

In [5]:
db_name = "youtube_test"
user = "postgres"
host = host
password = "mypw"

try:
    # declare a new PostgreSQL connection object
    conn = connect(
        dbname = db_name,
        user = user,
        host = host,
        password = password
    )

    # print the connection if successful
    print ("psycopg2 connection:\n", str(conn).split(' ')[5:8], "host=host", str(conn).split(',')[1] )
    cur = conn.cursor()

except Exception as err:
    print ("psycopg2 connect() ERROR:", err)
    conn = None
psycopg2 connection:
 ["'user=postgres", 'password=xxx', 'dbname=youtube_test'] host=host  closed: 0>

E2. Review raw data set from the video_stats table created in video_stats_hourly_ETL

  • We care about the title and these raw numbers in this data set
    • viewcount, likecount, dislikecount: common YouTube stats
    • vid_date, vid_time: retrieval time of stats
    • position: index position of video in the playlist starting with 0.
  • Check out the raw data in the review_df below
In [6]:
try: 
    cur.execute("""
    SELECT * FROM video_stats
    INNER JOIN playlists ON playlists.resourceid=video_stats.resourceid
    LIMIT 5
""")
    
    sql_return = cur.fetchall()
    columns = [column[0] for column in cur.description]
    review_df = pd.DataFrame(sql_return, columns = columns)

except Exception as err:
    print ("psycopg2 connect() ERROR:", err)
    
review_df
Out[6]:
etag viewcount likecount dislikecount commentcount vid_date vid_time resourceid title description pl_url thumbnails videopublishedat channeltitle position resourceid playlistid etag id
0 j-uEhuZGD6-KvUJFDGRVbnf4bGI 78525268 870052 20182 59742 2020-07-30 20:32:41 q6EoRBvdVPQ Yee Dinosauri bisesti dalle voci funeste. Original... https://www.youtube.com/watch?v=q6EoRBvdVPQ&li... https://i.ytimg.com/vi/q6EoRBvdVPQ/hqdefault.jpg 2012-02-29 19:47:08 kierancaspian 0 q6EoRBvdVPQ PLFsQleAWXsj_4yDeebiIADdH5FMayBiJo JFzB7xf4Hx7Ji8Sf6KNAlCvsj50 UExGc1FsZUFXWHNqXzR5RGVlYmlJQURkSDVGTWF5QmlKby...
1 DxrWUOKhCMqrSQTOw8To5v2FuW4 25699381 235640 5525 12551 2020-07-30 20:32:41 8YWl7tDGUPA color red Instagram: @jimmynez https://www.youtube.com/watch?v=8YWl7tDGUPA&li... https://i.ytimg.com/vi/8YWl7tDGUPA/hqdefault.jpg 2014-09-06 03:39:52 kierancaspian 1 8YWl7tDGUPA PLFsQleAWXsj_4yDeebiIADdH5FMayBiJo dZ6i6SJWZcvZlrVQ2QuLDPKWEcU UExGc1FsZUFXWHNqXzR5RGVlYmlJQURkSDVGTWF5QmlKby...
2 d69Rq7ShD1A-rNWO78Uy8ikuzrU 24288518 226146 20042 31401 2020-07-30 20:32:41 6bnanI9jXps Terrible Mall Commercial Now that's a catchy tune! https://www.youtube.com/watch?v=6bnanI9jXps&li... https://i.ytimg.com/vi/6bnanI9jXps/hqdefault.jpg 2014-08-17 20:45:59 kierancaspian 2 6bnanI9jXps PLFsQleAWXsj_4yDeebiIADdH5FMayBiJo B8HpEi2XY612lgHPLEQZBa0VZB8 UExGc1FsZUFXWHNqXzR5RGVlYmlJQURkSDVGTWF5QmlKby...
3 iJh82sFTR8eKTdneQK8F3tlVOjY 22896 327 3 53 2020-07-30 20:32:41 IOzaT_uTNiU [HD] Sandra Annenberg Cai Ao Vivo no Programa ... https://www.youtube.com/watch?v=IOzaT_uTNiU&li... https://i.ytimg.com/vi/IOzaT_uTNiU/hqdefault.jpg 2020-05-09 23:13:15 kierancaspian 3 IOzaT_uTNiU PLFsQleAWXsj_4yDeebiIADdH5FMayBiJo 6gP0PMMV_sYMeVrykUdSZ0RmpCA UExGc1FsZUFXWHNqXzR5RGVlYmlJQURkSDVGTWF5QmlKby...
4 GSh4U_OPd9I6LRBqokRgPyHC-0c 13740598 84867 819 4352 2020-07-30 20:32:41 SBeYzoQPbu8 name a yellow fruit https://www.youtube.com/watch?v=SBeYzoQPbu8&li... https://i.ytimg.com/vi/SBeYzoQPbu8/hqdefault.jpg 2013-10-30 19:20:15 kierancaspian 4 SBeYzoQPbu8 PLFsQleAWXsj_4yDeebiIADdH5FMayBiJo tn0QkHggUIceO1O1rg47GpRRTR0 UExGc1FsZUFXWHNqXzR5RGVlYmlJQURkSDVGTWF5QmlKby...

Transform

  • T1. Transform raw data in SQL to useful analytical data. Example SSIS ELT job.
    • A. Inner SQL Query builds CTE using LAG() and CASE to generate new count_deltas statistics
    • B. Middle SQL query uses SELECT FROM CTE to create time_delta_min and view_per_min.
      • Ensures clean data WHERE title=last_title.
    • C. Outter SUBQUERY SELECT to pull our clean newly defined deltas stats
  • T2. Build a few new columns from existing data and reset data types

T1. Transform raw data in SQL to useful analytical data

  • A. Inner SQL Query builds CTE using LAG() and CASE to generate new count_deltas statistics
  • B. Middle SQL query uses SELECT FROM CTE to create time_delta_min and view_per_min.
    • Ensures clean data WHERE title=last_title.</font>
  • C. Outter SUBQUERY SELECT to pull our clean newly defined deltas stats</font>

This query can be difficult to read outside of a native sql text editor, so I outline the structure here:

SUBQUERY SELECT to pull our newly defined stats ( CTE used to generate new count detlas and cols confirming clean data

      The CTE generates deltas based on current stats of the video minus the second most recent record
        EX: viewcount_delta = viewcount_t1 - viewcount_t0
      The deltas are made by ordering records OVER (ORDER BY title, vid_date, vid_TIME) and subtracting the
      previous record with the LAG() window function
        EX: viewcount_delta = viewcount - LAG(viewcount) OVER (ORDER BY title, vid_date, vid_TIME)
(
LAG previous title to check if it matches current title. Will clean data
WHERE title = last_title.
CASE WHEN record title matches previous record title
                When condition makes sure deltas are only calculated between the same column title.
                Prevents edge cases where the previous title rolls over to a new title.
THEN subtract stats to make deltas.
END
JOIN on another table that has title because our current table does not.
)
SELECT new deltas FROM CTE and create time_delta_min and view_per_min stats WHERE title = last_title ensures all deltas were calc'd between the same title. AND view_delta IS NOT NULL filters out edge cases where title != last_title )

This SQL query was built as an example to use as an SSIS ELT job

  • The transform may have been more efficiecntly done in Python
  • Should be fun to work into our airflow dags</font>
In [7]:
try: 
    cur.execute("""
--subquery to generate final view of stats
SELECT title, pl_url, position, videopublishedat, vid_date,
        viewcount, last_view, view_delta, view_per_min, 
        datetime, last_time, time_delta, time_delta_min, 
        commentcount, commentcount_delta, 
        likecount, likecount_delta,
        dislikecount, dislikecount_delta
FROM 
(
--CTE used to generate stats from viewcount, likecount, etc. The CTE is transforming the raw data into more powerful data
WITH cte AS ( 
        SELECT viewcount, likecount, dislikecount, commentcount, 
                vid_date, vid_time, title, playlists.position, pl_url, videopublishedat,
                
        --check last_title matches current title to ensure data is clean when calc'ing deltas with LAG()
        LAG(title) OVER (ORDER BY title, vid_date, vid_TIME) AS last_title,
        
        --check last_view count to ensure data is clean when calc'ing view_delta with LAG()
        LAG(viewcount) OVER (ORDER BY title, vid_date, vid_TIME) AS last_view,
        
        --check last_time date to ensure data is clean when calc'ing time_delta with LAG()
        LAG(vid_date + vid_time) OVER (ORDER BY title, vid_date, vid_TIME ) AS last_time,
        
        --create datetime column combining vid_date and vid_time cols
        (vid_date + vid_time) AS datetime,
        
        --CASE statements to calculate deltas of each statistic
        CASE 
        /*When title = LAG(title) ensures the following delta corresponds to videos with the same title
        This is needed because the records ORDER BY title, vid_date, vid_TIME
        So when the ordered records proceed to the next title, you don't care about the delta for A_title_views-B_title_views.
        */
        When title = LAG(title) OVER (ORDER BY title, vid_date, vid_time) 
        --Then calculate delta over stat 
        THEN viewcount - LAG(viewcount) OVER (ORDER BY title, vid_date, vid_time)
        END AS view_delta, 

        CASE 
        WHEN title = LAG(title) OVER (ORDER BY title, vid_date, vid_time) 
        THEN (vid_date + vid_time) - LAG(vid_date + vid_time) OVER (ORDER BY title, vid_date, vid_time)
        END AS time_delta,
        
        CASE 
        WHEN title = LAG(title) OVER (ORDER BY title, vid_date, vid_time) 
        THEN (commentcount) - LAG(commentcount) OVER (ORDER BY title, vid_date, vid_time)
        END AS commentcount_delta, 
        
        CASE 
        WHEN title = LAG(title) OVER (ORDER BY title, vid_date, vid_time) 
        THEN (likecount) - LAG(likecount) OVER (ORDER BY title, vid_date, vid_time)
        END AS likecount_delta, 
        
        CASE 
        WHEN title = LAG(title) OVER (ORDER BY title, vid_date, vid_time) 
        THEN (dislikecount) - LAG(dislikecount) OVER (ORDER BY title, vid_date, vid_time)
        END AS dislikecount_delta
        
        FROM video_stats
        INNER JOIN playlists ON playlists.resourceid=video_stats.resourceid
)
    --select * from CTE to do final transformations
    SELECT 	*, 
    --calculate time_delta_min to determine watch rate as view_per_min
    ROUND(CAST(EXTRACT(EPOCH FROM time_delta::INTERVAL)/60 AS NUMERIC), 2) AS time_delta_min, 
    ROUND(CAST(view_delta / (EXTRACT(EPOCH FROM time_delta::INTERVAL)/60) AS NUMERIC),2) as view_per_min
    FROM cte
    
    --where title = last_title verifies all deltas calc'd are a difference from the same title. Keeps data clean if anything slips through.
    WHERE title = last_title
    /*where view_delta IS NOT NULL removes all records where a delta wasnt calculated due to title != LAG(title). 
    This occurs ordered records proceed to the next title, so it results in NULL for A_title_views-B_title_views in our cte CASE statements
    */
    AND view_delta IS NOT NULL
) temptable
    """)
    
    sql_return = cur.fetchall()
    columns = [column[0] for column in cur.description]
    sql_df = pd.DataFrame(sql_return, columns = columns)

except Exception as err:
    print ("psycopg2 connect() ERROR:", err)

T2. Build a few new columns from existing data and reset data types

In [8]:
sql_df['hour'] = sql_df['datetime'].dt.hour
sql_df['day'] = sql_df['datetime'].dt.day_name()
sql_df['day_num'] = sql_df['datetime'].dt.weekday
sql_df['view_per_min'] = sql_df['view_per_min'].astype(float)
sql_df.head(5)
Out[8]:
title pl_url position videopublishedat vid_date viewcount last_view view_delta view_per_min datetime ... time_delta_min commentcount commentcount_delta likecount likecount_delta dislikecount dislikecount_delta hour day day_num
0 "11 minut" scena kaskaderska https://www.youtube.com/watch?v=3u2gpoASe3A&li... 125 2014-07-06 17:10:38 2020-07-31 3311964 3311774 190 0.25 2020-07-31 09:02:16 ... 749.58 1634.0 0.0 23870.0 3.0 321.0 0.0 9 Friday 4
1 "11 minut" scena kaskaderska https://www.youtube.com/watch?v=3u2gpoASe3A&li... 125 2014-07-06 17:10:38 2020-07-31 3312086 3311964 122 0.25 2020-07-31 17:12:54 ... 490.63 1634.0 0.0 23870.0 0.0 321.0 0.0 17 Friday 4
2 "11 minut" scena kaskaderska https://www.youtube.com/watch?v=3u2gpoASe3A&li... 125 2014-07-06 17:10:38 2020-07-31 3312128 3312086 42 0.39 2020-07-31 19:00:52 ... 107.97 1634.0 0.0 23871.0 1.0 321.0 0.0 19 Friday 4
3 "11 minut" scena kaskaderska https://www.youtube.com/watch?v=3u2gpoASe3A&li... 125 2014-07-06 17:10:38 2020-07-31 3312173 3312128 45 0.38 2020-07-31 21:00:42 ... 119.83 1634.0 0.0 23871.0 0.0 321.0 0.0 21 Friday 4
4 "11 minut" scena kaskaderska https://www.youtube.com/watch?v=3u2gpoASe3A&li... 125 2014-07-06 17:10:38 2020-07-31 3312211 3312173 38 0.32 2020-07-31 23:00:31 ... 119.82 1634.0 0.0 23870.0 -1.0 321.0 0.0 23 Friday 4

5 rows × 22 columns

Visualize

  • V1. Plot viewcount by position in the playlist
    • The top video is HEYYEYAAEYAAAEYAEYAA with 171M views, ~855,000USD in revenue
    • The second most played is the start of the playlist, YEE, with 78M views ~390,000USD in revenue

V1. Plot viewcount by position in the playlist

  • The top video is HEYYEYAAEYAAAEYAEYAA with 171M views, ~855,000USD in revenue
  • The second most played is the start of the playlist, YEE, with 78M views ~390,000USD in revenue

For example Gangnam Stule had 1B views and made 7.8M USD. Which is 780USD/100,000 views

In [9]:
top_views = sql_df[sql_df['datetime'] == sql_df.sort_values(
    by='datetime', ascending=False).loc[0,'datetime']].sort_values(
    by='viewcount', ascending=False)

fig = px.scatter(top_views, x="position", y="viewcount", color='title')
fig.update_layout(title="Total Views vs Position in Playlist", 
                  showlegend=False)
fig.show()

make_df_url(top_views[:2])
Out[9]:
title pl_url position videopublishedat vid_date viewcount last_view view_delta view_per_min datetime last_time time_delta time_delta_min commentcount commentcount_delta likecount likecount_delta dislikecount dislikecount_delta hour day day_num
27060 HEYYEYAAEYAAAEYAEYAA https://www.youtube.com/watch?v=ZZ5LpwO-An4&list=PLFsQleAWXsj_4yDeebiIADdH5FMayBiJo&index=280 280 2010-11-07 21:01:39 2020-07-31 171078784 171062516 16268 21.700000 2020-07-31 09:02:16 2020-07-30 20:32:41 0 days 12:29:35 749.58 310517.000000 34.000000 2838733.000000 548.000000 84607.000000 5.000000 9 Friday 4
72816 Yee https://www.youtube.com/watch?v=q6EoRBvdVPQ&list=PLFsQleAWXsj_4yDeebiIADdH5FMayBiJo&index=0 0 2012-02-29 19:47:08 2020-07-31 78533770 78525268 8502 11.340000 2020-07-31 09:02:16 2020-07-30 20:32:41 0 days 12:29:35 749.58 59756.000000 14.000000 870279.000000 227.000000 20184.000000 2.000000 9 Friday 4

V2. Generate df with average views per day by video

In [10]:
avg_views_day = sql_df.groupby(by=['vid_date','title']).mean().reset_index()

V3. Plot view_per_min by day to see if people are still watching

  • Our max, "Have you ever had a dream like this", recently averaged over 388 views per min on Aug 4
    • 2,800USD for that day alone.
In [11]:
fig = px.line(avg_views_day, x="vid_date", y="view_per_min", color='title')

fig.update_layout(title="Views/Min of Title by Day",
    legend=dict(
        title="Legend",
        yanchor="top",
        y=-0.7,
        xanchor="center",
        x=0.5
))

fig.show()

findVideo("Have you ever")
Out[11]:
title pl_url position videopublishedat vid_date viewcount last_view view_delta view_per_min datetime last_time time_delta time_delta_min commentcount commentcount_delta likecount likecount_delta dislikecount dislikecount_delta hour day day_num
25337 Have you ever had a dream like this? https://www.youtube.com/watch?v=G7RgN9ijwE4&list=PLFsQleAWXsj_4yDeebiIADdH5FMayBiJo&index=279 279 2011-06-02 10:00:55 2020-08-23 48936659 48929393 7266 60.660000 2020-08-23 13:00:17 2020-08-23 11:00:30 0 days 01:59:47 119.78 nan nan 1045799.000000 330.000000 17465.000000 1.000000 13 Sunday 6
25583 Have you ever had a dream like this? (Metal Remix) https://www.youtube.com/watch?v=dXidW7fEH8g&list=PLFsQleAWXsj_4yDeebiIADdH5FMayBiJo&index=181 181 2013-05-12 00:49:20 2020-08-23 5678588 5678524 64 0.530000 2020-08-23 13:00:17 2020-08-23 11:00:30 0 days 01:59:47 119.78 nan nan 94999.000000 1.000000 1050.000000 0.000000 13 Sunday 6

V4. Plot averages views of the playlist by hour to when people are watching most

  • Looks like more people watch in the afternoon with a peak ay 1900 CST, 7PM CST
    • also, looks like I have some dirty data in here.
    • The script only runs on odd hours but we have entries at 1600 and 1800.
      • We'll need to remove those from our data
In [12]:
fig = px.line(sql_df.groupby(by='hour').mean(), x=sql_df.groupby(by='hour').mean().index, 
              y=['view_per_min'])

fig.update_layout(
    title="Views/Min by Hour of Day",
    xaxis_title="hour of day",
    yaxis_title="view per min",
    legend_title="Legend",
)
fig.show()

V5. Clean up dirty data and plot averages playlist view by day again

  • That looks a lot cleaner and makes more sense for evening views
In [13]:
clean_df = sql_df[sql_df['hour'] % 2 > 0]

fig = px.line(clean_df.groupby(by='hour').mean().reset_index(), 
              x='hour', 
              y=['view_per_min'])

fig.update_layout(
    title="Views/Min by Hour of Day",
    xaxis_title="hour of day",
    yaxis_title="view per min",
    legend_title="Legend",
)
fig.show()

V6. Plot averages views of the playlist by day to when people are watching most

  • Surprisingly more people are watching these videos in the middle of the week, Max = Wednesday, 2.76 view/min
    • They probably have better things to do on the weekends
In [14]:
fig = px.line(clean_df.groupby(by='day').mean().reset_index().sort_values(by='day_num'), 
              x='day', 
              y='view_per_min')
fig.show()

  • 19:00 Tuesday is the max at 4.15 views/min
  • Most days have peaks at around 19:00 except for Friday
  • Weekend views are much lower on average
In [15]:
fig = px.line(clean_df.groupby(by=['day','hour']).mean().reset_index(), 
              x='hour', 
              y=['view_per_min'], color='day')
fig.show()

V8. Lets check our view_per_min/day distributions with a violin plot

  • Since the max daily average is 4 we'll consider all data an order of magnitude greater outliers, so 10 > view_per_min
  • it's also interesting to see that some is negative, likely a data collection issue from youtube viewcount
In [16]:
cols = ['title', 'view_per_min', 'viewcount', 'last_view']
fig = px.violin(clean_df[clean_df['view_per_min']<10].sort_values(by='day_num'), 
                y="view_per_min", x="day", color="day",
          hover_data=sql_df[cols])
fig.show()

clean_df[clean_df['view_per_min']<0][:5]
Out[16]:
title pl_url position videopublishedat vid_date viewcount last_view view_delta view_per_min datetime ... time_delta_min commentcount commentcount_delta likecount likecount_delta dislikecount dislikecount_delta hour day day_num
315 7 Reasons Why You Should Be Listening To Mackl... https://www.youtube.com/watch?v=0nHL7IbKdDQ&li... 283 2013-08-03 01:07:51 2020-08-06 1134565 1134566 -1 -0.01 2020-08-06 09:00:45 ... 119.82 830.0 0.0 17345.0 0.0 921.0 0.0 9 Thursday 3
361 7 Reasons Why You Should Be Listening To Mackl... https://www.youtube.com/watch?v=0nHL7IbKdDQ&li... 283 2013-08-03 01:07:51 2020-08-10 1134641 1134642 -1 -0.01 2020-08-10 09:00:29 ... 119.80 830.0 0.0 17344.0 0.0 921.0 0.0 9 Monday 0
385 7 Reasons Why You Should Be Listening To Mackl... https://www.youtube.com/watch?v=0nHL7IbKdDQ&li... 283 2013-08-03 01:07:51 2020-08-12 1134668 1134669 -1 -0.01 2020-08-12 09:00:38 ... 119.78 830.0 0.0 17344.0 0.0 921.0 0.0 9 Wednesday 2
2283 andrew houston, ceo of dropbox, abuses hookers https://www.youtube.com/watch?v=oXHM9XOigUM&li... 138 2016-11-07 16:01:45 2020-08-06 323109 323111 -2 -0.02 2020-08-06 09:00:45 ... 119.82 119.0 0.0 2171.0 0.0 37.0 0.0 9 Thursday 3
2599 animal calling https://www.youtube.com/watch?v=93rrZabgzMw&li... 75 2019-11-05 19:02:36 2020-08-12 155272 155274 -2 -0.02 2020-08-12 09:00:38 ... 119.78 56.0 0.0 1489.0 0.0 5.0 0.0 9 Wednesday 2

5 rows × 22 columns